
Conversation

Contributor

@dingo4dev dingo4dev commented Jul 18, 2025

Rationale for this change

This pull request introduces a Trino service to the Docker Compose development environment, addressing the need for broader integration testing capabilities as discussed in issue #2219.

Key additions:

  • Docker Compose configuration for Trino with Iceberg support
  • Support for both Hive Metastore and REST catalog types
  • Integration test infrastructure for UUID partition testing with Trino
  • Helper script (dev/run-trino.sh) for easier Trino testing workflows

Background:
Trino provides an alternative execution engine for testing Iceberg features, particularly important for testing UUID partitions which require Java Iceberg 1.9.2+ (not yet available in Spark at the time of this PR). This expands our integration testing capabilities beyond Spark.

Are these changes tested?

Yes, this PR includes test coverage:

  • Docker Compose setup: Added dev/docker-compose-trino.yml with Trino service configuration
  • Catalog configurations:
    • dev/trino/catalog/warehouse_hive.properties - Hive Metastore catalog
    • dev/trino/catalog/warehouse_rest.properties - REST catalog
  • Integration tests:
    • Added UUID partition tests in tests/integration/test_writes/test_writes.py
    • Added Trino fixtures in tests/conftest.py
    • Added REST catalog test in tests/integration/test_rest_catalog.py
  • CI validation: All existing CI checks pass, validating compatibility with the existing test infrastructure
  • Manual testing: Can be tested locally via make run-trino or ./dev/run-trino.sh

Test structure:

  • Tests use the @pytest.mark.trino marker to identify Trino-specific tests (see the sketch after this list)
  • Proper skip conditions are applied when Spark doesn't support certain features (e.g., BucketTransform with UUID)
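
As a sketch of this structure (the fixture name trino_conn and the query here are illustrative assumptions, not the exact tests in this PR), a Trino-marked test might look like:

import pytest
from sqlalchemy import text

@pytest.mark.trino  # select with `pytest -m trino`, or deselect with `-m "not trino"`
def test_trino_lists_schemas(trino_conn):
    # information_schema always exists in a Trino catalog, so this only checks connectivity.
    schemas = [row[0] for row in trino_conn.execute(text("SHOW SCHEMAS"))]
    assert "information_schema" in schemas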

Are there any user-facing changes?

For contributors/developers:

Yes, this adds new development tooling:

  1. New Make target: make run-trino - Starts Trino Docker environment
  2. Helper script: dev/run-trino.sh - Convenience script for Trino testing
  3. Test fixtures: Trino catalog fixtures available for integration tests
  4. Documentation: Configuration files serve as examples for Trino + Iceberg setup

For end users:

No. This is purely a development/testing infrastructure change. It does not affect the pyiceberg library API or behavior.

@dingo4dev dingo4dev force-pushed the test-trino-integration branch 4 times, most recently from 04d1f61 to a0e1a11 Compare July 22, 2025 19:01
@dingo4dev dingo4dev marked this pull request as ready for review July 22, 2025 19:02
Copilot AI review requested due to automatic review settings July 22, 2025 19:02
@dingo4dev dingo4dev changed the title from "[WIP] Trino: Add Trino Docker Compose for integration testing" to "Trino: Add Trino Docker Compose for integration testing" Jul 22, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces Trino integration testing capabilities to the PyIceberg project by adding Docker Compose configuration and test fixtures for Trino services. The changes enable testing of Iceberg operations through Trino's SQL interface to ensure compatibility and proper integration between PyIceberg and Trino.

Key changes include:

  • Added Trino Docker services configuration with both REST and Hive catalog support
  • Introduced new test fixtures for Trino database connections (a sketch follows this list)
  • Created integration tests that verify PyIceberg operations work correctly with Trino
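
A minimal sketch of what such a fixture could look like, assuming the trino package with its SQLAlchemy dialect is installed; the host, port, user, and catalog name below are assumptions rather than what conftest.py necessarily uses:

import pytest
from sqlalchemy import create_engine

@pytest.fixture(scope="session")
def trino_conn():
    # Connect to the Trino container and its REST-backed Iceberg catalog.
    engine = create_engine("trino://admin@localhost:8080/warehouse_rest")
    with engine.connect() as conn:
        yield conn
    engine.dispose()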

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Summary per file:

  • tests/integration/test_writes/test_writes.py: Added a new Trino-specific UUID partitioning test and skipped the existing Spark test
  • tests/integration/test_rest_catalog.py: Added a test to verify an Iceberg namespace appears as a Trino schema
  • tests/integration/test_register_table.py: Added a test to verify table registration works with Trino
  • tests/conftest.py: Added Trino connection fixtures and command-line options
  • pyproject.toml: Added the Trino dependency and an integration test marker
  • dev/trino/catalog/*.properties: Trino catalog configuration files for the REST and Hive catalogs
  • dev/run-trino.sh: Shell script to manage the Trino Docker services
  • dev/docker-compose-trino.yml: Standalone Trino Docker Compose configuration
  • dev/docker-compose-integration.yml: Added the Trino service to the main integration environment
  • Makefile: Added a target for running Trino integration tests
Comments suppressed due to low confidence (2)

tests/integration/test_register_table.py:109

  • [nitpick] The test doesn't handle the case where the Trino connection might fail or time out. Consider adding error handling or retries for database connectivity issues that could make the test flaky; a sketch of one possible retry follows below.
    assert table_name in inspect(trino_conn).get_table_names(schema=namespace)
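
One possible shape for such a retry, as a rough sketch (the helper name, fixture, and timing values are illustrative assumptions):

import time
from sqlalchemy import inspect

def wait_for_trino_table(trino_conn, namespace, table_name, attempts=10, delay=2.0):
    # Poll Trino until the table becomes visible or the attempts are exhausted.
    for _ in range(attempts):
        if table_name in inspect(trino_conn).get_table_names(schema=namespace):
            return True
        time.sleep(delay)
    return False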

@dingo4dev dingo4dev force-pushed the test-trino-integration branch from df48ce0 to f46d70f Compare August 28, 2025 19:02
@dingo4dev dingo4dev force-pushed the test-trino-integration branch from f46d70f to 06f7419 Compare November 12, 2025 17:11
Member

@raulcd raulcd left a comment


Thanks for the PR!

assert tbl.scan().to_arrow() == arrow_table


@pytest.mark.skip("UUID BucketTransform is not supported in Spark Iceberg 1.9.2 yet")
Member


is this test skip related to the trino work?
I am curious why this wasn't skipped before

Contributor Author


@raulcd, thanks for your review.

Before this PR, the bucket transform was not tested in this test case because of an issue we found in Iceberg Java:

@pytest.mark.parametrize(
    "transform",
    [
        IdentityTransform(),
        # Bucket is disabled because of an issue in Iceberg Java:
        # https://github.com/apache/iceberg/pull/13324
        # BucketTransform(32)
    ],
)
def test_uuid_partitioning(session_catalog: Catalog, spark: SparkSession, transform: Transform) -> None:  # type: ignore

My thought was to use Trino to test this UUID partition test case for both the Identity and Bucket transforms instead of Spark Iceberg, because the UUID BucketTransform is not supported in Spark but works in Trino.

To isolate the current issue in Iceberg Java, I added a new test case for it here: https://github.com/dingo4dev/iceberg-python/blob/3eef2566e02342d65a2c2f650bfb6c62d0221cf3/tests/integration/test_writes/test_writes.py#L2161

Member


I see, thanks for the comment. I understand you are adding a new test, test_uuid_partitioning_with_trino, that covers both BucketTransform and IdentityTransform, but I don't think adding the conditional skip on this specific test is necessary. Either we leave it as is (the test will still run on non-Trino setups) or we remove it entirely, since the transform is covered by the Trino setup. To be honest, I would rather keep it, because if the Iceberg Java issue is fixed in the future we can simply add BucketTransform to run unconditionally.

Contributor Author


@raulcd You're right! I just found that there is a way to add the skip marker to a specific parameter, so we can keep the partition test running on Spark while skipping the BucketTransform case, and it is easier to track than a comment. How does that sound?
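
For illustration, the per-parameter skip could look roughly like this (a sketch based on the discussion, not the final diff; the reason string mirrors the existing skip marker):

import pytest
from pyiceberg.transforms import BucketTransform, IdentityTransform

@pytest.mark.parametrize(
    "transform",
    [
        IdentityTransform(),
        pytest.param(
            BucketTransform(32),
            marks=pytest.mark.skip(reason="UUID BucketTransform is not supported in Spark Iceberg 1.9.2 yet"),
        ),
    ],
)
def test_uuid_partitioning(session_catalog, spark, transform):
    ...  # existing test body unchanged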

Member


Sounds good in principle. If you can push or show the diff I might understand better what you mean :)
Thanks for looking at this!

Contributor Author


Sure! I just updated it. Please have a look ;)

Member


The change lgtm! Thanks. I'll see if I have some time to go over the changes in detail for a proper review, but we will still need a committer review :)

@@ -0,0 +1,23 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member


Why do we want to add the config.properties file?

Contributor Author

@dingo4dev dingo4dev Dec 3, 2025


Hi @ebyhr, thanks for your review.
My initial thought was to set catalog.management to dynamic, which helps to add and test the Glue and DynamoDB catalogs without restarting the container. I then figured that adding it would also improve the development experience: you can modify the server configuration for your own resources without updating the Dockerfile yourself.

Member


Thank you, I understand the motivation now. The next question is why enabling the CATALOG_MANAGEMENT environment variable is insufficient.

Contributor Author


I added catalog.management=${ENV:CATALOG_MANAGEMENT} to keep it consistent when someone updates config.properties explicitly. Alternatively, we can just use catalog.management=dynamic and remove the env var.

Comment on lines 68 to 69
environment:
- CATALOG_MANAGEMENT=dynamic
Member


This environment variable looks redundant since we're adding catalogs with properties files instead of the CREATE CATALOG statement.

s3.aws-access-key=admin
s3.aws-secret-key=password
s3.endpoint=http://minio:9000
s3.path-style-access=false
Member


nit: We could remove this s3.path-style-access=false. It's disabled by default.

- rest
- hive
volumes:
- ./trino/catalog:/etc/trino/catalog
Member

@ebyhr ebyhr Dec 3, 2025


nit: How about specifying the full paths of the properties files? That way we can use existing catalogs such as the memory and TPCH catalogs - they are helpful during development.

Contributor Author


Sounds great!

Contributor

@kevinjqliu kevinjqliu left a comment


Thanks for working on this! It's great to see another engine integration.

A couple of general comments:

  • Make Trino infra and integration tests optional for now and hook it up to the CI later
  • Consolidate the 3 different ways to spin up trino infra. My preference would be to create a new docker-compose-trino with just the trino infra and spin up that alongside the original dev/docker-compose-integration.yml.
  • Group all the infra related changes to dev/trino/, similar to what we do for spark
  • Create a separate test file for trino tests

Let me know what you think!

dingo4dev added a commit to dingo4dev/iceberg-python that referenced this pull request Feb 1, 2026
Address feedback from ebyhr on PR apache#2220 discussion r2583421945.

Instead of mounting the entire catalog directory, mount individual
catalog property files. This allows Trino to use its built-in catalogs
(memory, TPCH) which are helpful during development, while still
providing our custom Iceberg catalogs.

Added:
- memory.properties - In-memory connector for quick testing
- tpch.properties - TPC-H benchmark data for development

This makes the Trino container more flexible and developer-friendly.
dingo4dev added a commit to dingo4dev/iceberg-python that referenced this pull request Feb 1, 2026
Address feedback from ebyhr on PR apache#2220 discussion r2583421945.

Instead of mounting the entire catalog directory, mount individual
catalog property files. This allows Trino to preserve its built-in
catalogs (memory, TPCH) which are helpful during development, while
still providing our custom Iceberg catalogs.

Mounted files:
- warehouse_rest.properties - REST catalog configuration
- warehouse_hive.properties - Hive catalog configuration
- config.properties - Trino server configuration
Adds a Docker Compose configuration for setting up Trino with Iceberg, including support for Hive Metastore and REST catalog types. This allows for easier testing and development with Trino and Iceberg.

closes apache#2219

add uuid partitions test with trino

Add Trino as an alternative tool for testing UUID partitions, as Java Iceberg 1.9.2 is not yet available in Spark.
<!--related to apache#2002-->

fix: correct conditions in namespace

fix: add license to trino config file

remove precommit as prek exist

use mark to skip BucketTransform in Spark uuid partition test

Trino: Restructure to make Trino integration optional and modular

Address reviewer feedback from kevinjqliu:

1. Consolidated Trino infrastructure:
   - All Trino config files remain in dev/trino/ directory
   - docker-compose-trino.yml moved to dev/ (alongside integration compose)
   - run-trino.sh moved to dev/ (alongside other run scripts)

2. Removed Trino from main integration docker-compose:
   - Trino service removed from dev/docker-compose-integration.yml
   - Trino can now be spun up separately alongside main integration
   - Keeps Trino testing optional and not part of CI

3. Created dedicated test file:
   - tests/integration/test_trino.py for all Trino-specific tests
   - Moved test_schema_exists_in_trino from test_rest_catalog.py
   - Moved test_uuid_partitioning_with_trino from test_writes.py
   - Better separation of concerns and easier to maintain

4. Simplified pytest marker:
   - Changed from @pytest.mark.integration_trino to @pytest.mark.trino
   - Updated Makefile target: test-integration-trino -> test-trino
   - Updated pyproject.toml and conftest.py references

This makes Trino integration testing opt-in and follows the same
pattern as other optional test suites (s3, adls, gcs).

Trino: Mount catalog files individually to preserve built-in catalogs

Address feedback from ebyhr on PR apache#2220 discussion r2583421945.

Instead of mounting the entire catalog directory, mount individual
catalog property files. This allows Trino to preserve its built-in
catalogs (memory, TPCH) which are helpful during development, while
still providing our custom Iceberg catalogs.

Mounted files:
- warehouse_rest.properties - REST catalog configuration
- warehouse_hive.properties - Hive catalog configuration
- config.properties - Trino server configuration

make lint
@dingo4dev dingo4dev force-pushed the test-trino-integration branch from d0f30bf to 774ec5f Compare February 1, 2026 04:18
@dingo4dev dingo4dev force-pushed the test-trino-integration branch from 6db859b to 72fa3c6 Compare February 1, 2026 05:44